Sparse LU Decomposition using FPGA

Authors

  • Jeremy Johnson
  • Timothy Chagnon
  • Petya Vachranukunkiet
  • Prawat Nagvajara
  • Chika Nwankpa
Abstract

This paper reports on an FPGA implementation of sparse LU decomposition. The resulting special-purpose hardware is geared towards power system problems, such as load flow computation, which are typically solved iteratively using Newton-Raphson. The key step in this process, which takes approximately 85% of the computation time, is the solution of the sparse linear systems arising from the Jacobian matrices that occur in each Newton-Raphson iteration. Current state-of-the-art software packages, such as UMFPACK and SuperLU, running on general-purpose processors perform suboptimally on these problems because they utilize the floating-point hardware poorly (typically 1 to 4% efficiency). Our LU hardware uses a special-purpose data path and cache designed to keep the floating-point hardware busy, and achieves an efficiency of 60% or higher. This improved efficiency provides an order-of-magnitude speedup over a software solution using UMFPACK running on general-purpose processors.
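The role of the sparse linear solve inside Newton-Raphson can be illustrated with a minimal software sketch. It is not the paper's hardware design: the function names (`lu_solve_sparse`, `newton_raphson`), the dict-of-rows storage, and the three-equation toy system are assumptions chosen for illustration, and the toy system is not a real load flow model. Each iteration builds the Jacobian, eliminates it LU-style with partial pivoting (the multipliers of L are applied to the right-hand side on the fly instead of being stored), and back-substitutes for the update.

```python
def lu_solve_sparse(rows, b):
    """Solve A x = b by LU-style elimination with partial pivoting.

    `rows` is a list of dicts mapping column index -> nonzero value,
    so only nonzeros are stored. The inputs are not modified.
    """
    n = len(rows)
    a = [dict(r) for r in rows]
    x = list(b)
    for k in range(n):
        # Partial pivoting: choose the row with the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(a[i].get(k, 0.0)))
        if a[p].get(k, 0.0) == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        x[k], x[p] = x[p], x[k]
        pivot = a[k][k]
        for i in range(k + 1, n):
            if k not in a[i]:
                continue  # structural zero: this row needs no work
            m = a[i].pop(k) / pivot  # multiplier (an entry of L)
            for j, v in a[k].items():
                if j > k:
                    a[i][j] = a[i].get(j, 0.0) - m * v
            x[i] -= m * x[k]
    # Back substitution with the upper-triangular factor U.
    for k in reversed(range(n)):
        s = sum(v * x[j] for j, v in a[k].items() if j > k)
        x[k] = (x[k] - s) / a[k][k]
    return x


def newton_raphson(f, jac, x0, tol=1e-10, max_iter=50):
    """Solve f(x) = 0; every iteration solves J(x) dx = -f(x)."""
    x = list(x0)
    for _ in range(max_iter):
        fx = f(x)
        if max(abs(v) for v in fx) < tol:
            return x
        dx = lu_solve_sparse(jac(x), [-v for v in fx])
        x = [xi + di for xi, di in zip(x, dx)]
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical toy system (not a power network) with a sparse Jacobian.
def f(x):
    return [x[0] ** 2 + x[1] - 3.0,
            x[0] + x[1] ** 2 - 5.0,
            x[2] ** 2 - 4.0]


def jac(x):
    return [{0: 2.0 * x[0], 1: 1.0},
            {0: 1.0, 1: 2.0 * x[1]},
            {2: 2.0 * x[2]}]


solution = newton_raphson(f, jac, [1.5, 1.5, 1.5])
```

Because the Jacobian is rebuilt and refactored in every iteration, the factorization dominates the run time, which is the step the abstract reports at roughly 85% of the computation.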

Similar resources

Criticality-driven Token Dataflow Optimizations for FPGA-based Sparse LU Factorization

Performance of FPGA-based token dataflow architectures is often limited by the long tail distribution of parallelism in the compute paths of dataflow graphs. This is known to limit speedup of dataflow processing of Sparse LU factorization to only 3-10× over CPUs. In this paper, we show how to overcome these limitations by exploiting criticality information along compute paths; both statically ...


Parallel Direct Solution of Linear Equations on FPGA-Based Machines

The efficient solution of large systems of linear equations represented by sparse matrices appears in many tasks. LU factorization followed by backward and forward substitutions is widely used for this purpose. Parallel implementations of this computation-intensive process are limited primarily to supercomputers. New generations of Field-Programmable Gate Array (FPGA) technologies enable the im...


High-Performance Linear Algebra Processor using FPGA

With recent advances in FPGA (Field Programmable Gate Array) technology, it is now feasible to use these devices to build special-purpose processors for floating-point-intensive applications that arise in scientific computing. FPGAs provide programmable hardware that can be used to design custom hardware without the high cost of traditional hardware design. In this talk we discuss two multi-proc...


FPGA Based Efficient Cholesky Decomposition for Solving Least Square Problem

The paper presents the FPGA-based design and implementation of Cholesky decomposition for matrix calculation to solve the least squares problem. Cholesky decomposition requires no pivoting, yet the factorization is stable. It also has the advantage that, instead of two matrices, only one matrix multiplied by its own transpose is needed. Hence it requires half as many operations. The Cholesky decomposition has been designed & sim...


Publication date: 2008